Unsupervised Learning of Morphology for English and Inuktitut
Abstract
We describe a simple unsupervised technique for learning morphology by identifying hubs in an automaton. For our purposes, a hub is a node in a graph with in-degree greater than one and out-degree greater than one. We create a word-trie, transform it into a minimal DFA, then identify hubs. Those hubs mark the boundary between root and suffix, and the approach achieves performance similar to that of more complex mixtures of techniques.
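The pipeline in the abstract (word-trie, minimal DFA, hub detection) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`build_trie`, `minimize`, `find_hubs`, `segment`) and the toy word list are hypothetical. The sketch also assumes that in-degree and out-degree count labeled transitions only, that state finality does not count toward out-degree, and that a hub reached at the very end of a word does not produce a split.

```python
from collections import defaultdict

def build_trie(words):
    # Each trie node is a dict: {"final": bool, "edges": {char: child node}}.
    root = {"final": False, "edges": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["edges"].setdefault(ch, {"final": False, "edges": {}})
        node["final"] = True
    return root

def minimize(trie_root):
    """Merge trie nodes that accept the same set of continuations.

    For an acyclic trie, merging nodes bottom-up by (finality, outgoing
    transitions to already-merged states) yields the minimal DFA
    (without an explicit dead state).
    """
    registry = {}  # signature -> state id
    states = []    # state id -> (is_final, {char: target state id})

    def visit(node):
        edges = {ch: visit(child) for ch, child in node["edges"].items()}
        signature = (node["final"], tuple(sorted(edges.items())))
        if signature not in registry:
            registry[signature] = len(states)
            states.append((node["final"], edges))
        return registry[signature]

    start = visit(trie_root)
    return start, states

def find_hubs(states):
    # Assumption: degrees count labeled transitions only.
    in_degree = defaultdict(int)
    for _, edges in states:
        for target in edges.values():
            in_degree[target] += 1
    return {sid for sid, (_, edges) in enumerate(states)
            if in_degree[sid] > 1 and len(edges) > 1}

def segment(word, start, states, hubs):
    # Split the word wherever its path passes through a hub,
    # ignoring a hub reached only at the end of the word.
    pieces, last, state = [], 0, start
    for i, ch in enumerate(word, start=1):
        state = states[state][1][ch]
        if state in hubs and i < len(word):
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces

# Toy vocabulary (hypothetical, not from the paper's data).
words = ["walk", "walks", "walked", "walking",
         "jump", "jumps", "jumped", "jumping"]
start, states = minimize(build_trie(words))
hubs = find_hubs(states)
print(segment("walked", start, states, hubs))   # ['walk', 'ed']
print(segment("jumping", start, states, hubs))  # ['jump', 'ing']
```

In this toy vocabulary the states reached after "walk" and "jump" accept the same continuations, so minimization merges them into one state with two incoming and three outgoing transitions, which is the only hub. Note that two stems sharing a long common tail (for example "walk" and "talk") would merge earlier in the path, so the stem-final state would have in-degree one and would not qualify as a hub under this definition.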
Similar resources
The acquisition of ergativity in Inuktitut*
One potential challenge for children learning Inuktitut comes from the ergative case marking system, because of the contrast between the ergative system in morphology and the accusative system governing syntax. However, no studies have yet been published focusing on how Inuktitut-speaking children acquire ergativity. In this chapter, we investigate this process using naturalistic spontaneous sp...
Unsupervised Learning by Program Synthesis
We introduce an unsupervised learning algorithm that combines probabilistic modeling with solver-based techniques for program synthesis. We apply our techniques to both a visual learning domain and a language learning problem, showing that our algorithm can learn many visual concepts from only a few examples and that it can recover some English inflectional morphology. Taken together, these res...
Semi-Supervised Learning of Concatenative Morphology
We consider morphology learning in a semi-supervised setting, where a small set of linguistic gold standard analyses is available. We extend Morfessor Baseline, which is a method for unsupervised morphological segmentation, to this task. We show that known linguistic segmentations can be exploited by adding them into the data likelihood function and optimizing separate weights for unlabeled and...
Aligning and Using an English-Inuktitut Parallel Corpus
A parallel corpus of texts in English and in Inuktitut, an Inuit language, is presented. These texts are from the Nunavut Hansards. The parallel texts are processed in two phases, the sentence alignment phase and the word correspondence phase. Our sentence alignment technique achieves a precision of 91.4% and a recall of 92.3%. Our word correspondence technique is aimed at providing the broades...
Unsupervised Learning of Morphology Using a Novel Directed Search Algorithm: Taking the First Step
This paper describes a system for the unsupervised learning of morphological suffixes and stems from word lists. The system is composed of a generative probability model and a novel search algorithm. By extracting and examining morphologically rich subsets of an input lexicon, the search identifies highly productive paradigms. Quantitative results are shown by measuring the accuracy of the morp...